Random Forests for Big Data
نویسندگان
چکیده
منابع مشابه
Random Forests for Big Data
Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data but they also often include data streams and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, like linear regression models, clustering methods and bootstrapping schemes. Based o...
متن کاملTraining Big Random Forests with Little Resources
Without access to large compute clusters, building random forests on large datasets is still a challenging problem. This is, in particular, the case if fully-grown trees are desired. We propose a simple yet effective framework that allows to efficiently construct ensembles of huge trees for hundreds of millions or even billions of training instances using a cheap desktop computer with commodity...
متن کاملBig Data Analytics framework for Peer-to-Peer Botnet detection using Random Forests
Network traffic monitoring and analysis-related research has struggled to scale for massive amounts of data in real time. Some of the vertical scaling solutions provide good implementation of signature based detection. Unfortunately these approaches treat network flows across different subnets and cannot apply anomaly-based classification if attacks originate from multiple machines at a lower s...
متن کاملRandom survival forests for high-dimensional data
Minimal depth is a dimensionless order statistic that measures the predictiveness of a variable in a survival tree. It can be used to select variables in high-dimensional problems using Random Survival Forests (RSF), a new extension of Breiman’s Random Forests (RF) to survival settings. We review this methodology and demonstrate its use in high-dimensional survival problems using a public domai...
متن کاملA Random Sample Partition Data Model for Big Data Analysis
Big data sets must be carefully partitioned into statistically similar data subsets that can be used as representative samples for big data analysis tasks. In this paper, we propose the random sample partition (RSP) to represent a big data set as a set of non-overlapping data subsets, i.e. RSP data blocks, where each RSP data block has the same probability distribution with the whole big data s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Big Data Research
سال: 2017
ISSN: 2214-5796
DOI: 10.1016/j.bdr.2017.07.003